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Abstract: We apply two variations of the principle of Minimum Cross Entropy 
(the KuUback information measure) to fit parameterized probability density mod- 
els to observed data densities. For an array beamforming problem with P incident 
narrowband point sources, N > P sensors, and colored noise, both approaches 
yield eigenvector fitting methods similar to that of the MUSIC algorithm [1]. Fur- 
thermore, the corresponding cross-entropies are related to the MDL model order 
selection criterion[2]. 
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1. Introduction 

Many existing high resolution methods for spectral analysis and for optimal 
beamforming utilize covariance matrices estimated from observed data. Often, 
an underlying structure for the covariance matrix is known in advance, and our 
goal is to estimate the covariance matrix with this structure which best fits the 
observed data. Previous literature has suggested a variety of methods of opti- 
mally estimating structured covariance matrices from data[3,4,5]. In this paper, 
we will apply the minimum cross entropy (CE)[6,7] and minimum reverse cross- 
entropy (RCE)[6] principles to estimate the covariance matrix. These principles 
have proved to be quite powerful in a wide variety of signal processing applica- 
tions[8,9] and have been justified as being "optimal" under suitable assumptions. 
In section 2, we apply the CE and RCE procedures to the problem of estimating 
structured covariance matrices, and in section 3 we demonstrate the utility of 
the idea for a beamforming application. 
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2. Problem Statement 

Let X be an N-dimensional real or complex random vector. Assume that a 
Gaussian probability density for x is either known a prior, or has been estimated 
by some procedure from observed data: 

p{x) = N{m, R) (1) 

where m is the expected value of x, and R is the covariance matrix, R = E\xx^], 
and where x^ is the Hermitian (complex conjugate transpose) of x. Suppose we 
wish to approximate this p{x) with a parameterized probability density function 
(PDF): 

qe{x) = N{rrig,Rg) (2) 

where 6 denotes the unknown parameters 6 in the model qeix) which are to be 
estimated. Conceptually, we wish to choose 6 to make qe{x) optimally match 
p{x). An appropriate objective function is the KuUback information measure[6], 
otherwise known as the Minimum Cross- Entropy principle[7]. Because this mea- 
sure is asymmetric, we can apply it in two different ways to this problem. Fol- 
lowing[8,9,10] we call these the "Cross-Entropy" and "Reverse Cross-Entropy" 
methods: 

CE : qe^ mm H{q0,p) (3) 
RCE : qe^ mm H{p,q0) (4) 

where: 

H{puP2)= [pi{x)log^dx (5) 

KuUback [6] has argued that H{pi,p2) measures the mean amount of information 
for discriminating in favor of the hypothesis that pi is the correct density of x 
rather thanp2- Shore and Johnson [7] have argued that minimizing H{pi,p2) over 
pi is the only consistent estimation procedure for estimating a PDF given an a 
prior density estimate P2{x) combined with new structural information about the 
density, such as one or more of its moments. The measure H{pi,p2) has several 
pleasing mathematical properties: it is convex inpi, and convex 'mp2, and attains 
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its minimum value of zero when Pi{x) = P2i-L) almost everywhere. Another useful 
property is that estimating 6 from either (3) or (4) is straightforward. Substitute 
(1) and (2) into the CE and RCE formulas to obtain: 

CE : H{q0,p) =^{tr{R-^Re) - N -\og\R~^R0\+{mg-m)"R~^{mg-m)} 
RCE : H{p, qe) = ^{tr{Rg^R) - AT - log \Rg^R\ + {m - Ek)" Re^m - Ek)} 

where ^ = 1/2 when x is real and ^ = 1 when x is complex. 

To simplify the remainder of the discussion, assume that the mean is known, 
m o =rn, so that we can focus on the estimation of the covariance matrix and 
compare the results with those by Burg and Gray [4] and Gray, Anderson, Sim[5]. 
The two estimation problems reduce to minimizing: 

CE : H{qe,p)=C{triR-^Re)-N-log\R-^Re\} (6) 
RCE : H{p,qe)=atr{Rg^R)-N-log\Rg^R\} (7) 



Setting the gradients of the above two objective functions with respect to 9 
to zero, we obtain the necessary conditions that 6 be the optimal solution: 



dRn^ 1 







RCE: tr{{R-Rg)- 



= 



(8) 
(9) 



for all i, where 9i is the z*'* element of 9_. When Rq is invertible and differentiable 



in 9: 



dRa^ 



(10) 



^e^ ' 89, 

Substituting this into the RCE formula gives an alternate set of necessary con- 
ditions for the optimal RCE solution: 



RCE: trl^iRg'RRg'-Rg')^^ 



= 



(11) 
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3. Application to Array Beamforming 

In this section we will apply the CE and RCE methods to fitting a low 
rank plus noise covariance matrix to data. Such problems arise in a variety of 
contexts, including narrowband sensor array processing and harmonic retrieval. 
We focus on the former problem. Let x[n] = {xi[n], ...,XN[n])'^ be a vector 
of sensor measurements at time n, where N is the total number of sensors in 
the array. Assume that the signal is narrowband (perhaps because the sensor 
data has been preprocessed through a Fast Fourier Transform of each sensor's 
data). Let our initial PDF estimate for the data be given by = N{0,R), 

where R is any non-parameterized estimate of the signal covariance, such as 
R = ^ ^k=i ^ [^] [^] where K snapshots of array data are used. 

Now suppose we wish to model the data x[n] as: 

p 

x[n] = ^ SiMMi + <Jw[n] (12) 

i=l 

where si[n], sp[n] are P source signals, P < N, 
arriving from unknown directions ui, ■■■,Up, with additive noise w[n\ with gain 
a. Suppose that signals Si[n] are statistically independent, real or complex zero 
mean Gaussian random variables with covariance Aj > 0, and that the noise 
samples w[n\ are statistically independent, real or complex zero mean Gaussian 
random variables with covariance W. 

pisi[n]) = 7V(0,A,) (13) 
p{w[n]) = N{0,W) (14) 

Thus the parameterized model PDF of x[n] is Gaussian: 

qeix[n]) = NiO,Rg) (15) 

where: 

p 

Rg = Y^ AiUiuf + a'^W (16) 

i=l 

We will assume that the noise covariance W is known, but that all the other 
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parameters ^= (Ai, Ap,Ui, ...Up,a)^ must be estimated. For convenience 
define: 

Re = UAU" + a^W 



where: 



U 



ttl Un 



u. 



and A 



Ai 







Af 



(17) 



(18) 



Suppose there are no a priori constraints on the matrix U ^ and that the only 
constraints on A are that Aj > 0. This would typically be true if the array 
were uncalibrated, or subject to heavy unknown multipath distortion. (Note 
that because we assume an uncalibrated array, we will not be able to directly 
derive information about the direction of arrival.) Appendices A and B apply 
the CE and RCE criteria to this model. They show that the solution to these 
two problems are quite similar, and can be found by the following algorithm: 

CE and RCE BEAMFORMING ALGORITHMS 



1. Find the generalized eigenvector and eigenvalue Aj solutions to: 



(19) 



with normalization constraint ufW ^2ij = ^ij- 



2. Sort the eigenvectors and eigenvalues so that Ai > A2 > ... > Aat. Then 
the optimal structured covariance matrix approximation Rq to i? is : 



Re = «2 ••• Hp ) 



\ 



Xp - a^J 



+ a^W (20) 
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where: 



N 

j^ = Nh E i for CE 

i=p+i 

N 

a^ = j^ J2 Ai forRCE 

i=P+l 



3. The cross entropy for the optimal model is: 



N 



CE: H{qe,p)=i V \og{^) 

i=P+i 

N .2 

RCE: H{p,qe)=i ^ log(^) 

i=P+i 



(21) 



(22) 
(23) 



The estimates of and will be unique if and only if Ap > Xp+i- (The 
estimate of U will not be unique.) 

An interesting alternative form for the cross-entropy formulas can be found 
by substituting the value of from (21) into (22): 



where: 



CE: H{qe,p) = i{N - P)\og 



RCE: H(j),q6)=m -P)^og 



avg 



1 1 

Ap_|_i ' Ajv 



geo. 



\P+1 



avg 



[Ap+i,...,Aiv]geo 



(24) 

(25) 



^\ avg 
[/3p+l) ■■■■,PN\geo 



N-P ^ 

i=P+l 



(26) 
(27) 



The cross-entropies are proportional to the log of the ratio of the arithmetic 
mean to the geometric mean of the eigenvalues (or their inverses) that are not 
used in building U . The cross-entropy will therefore be positive, and will attain 
their minimum value of zero only if the geometric average of Ap+i,...,AAr (or 
their inverses) equals their arithmetic mean. This will only occur if these N — P 
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smallest generalized eigenvalues are all equal. 

Note the similarity of the RCE formula to the MDL order determination 
algorithm suggested by Wax and Kailath[2]. The RCE criterion is also strongly 
related to the Maximum Likelihood problem of estimating the structured covari- 
ance matrix given observations xi, ...,xk' 

Rg maxIogp(xi, ...,x^ | 9) (28) 


where: 

K 

Pix^,-,XK\e) =llp{x,\e) (29) 

i=l 

and: 

p{xi\e) = N{0,R) (30) 

This is because: 

H{p, qe) = ^ logp(xi, ...,Xk I e) - m + log 1^1) (31) 

Since the second term in (31) does not depend on 9, the RCE estimate of Re will 

be identical to the ML estimate. 

For the special case when the background noise is white Gaussian noise, 
W = I, the Ui must satisfy: 

= AjMj (32) 

and thus the are the eigenvectors of the observed data correlation matrix R. 
This special case is thus quite similar to that used in the MUSIC algorithm[l] 
and other similar beamforming algorithms. 

If subroutines for computing generalized eigenvectors are not available, we 
can use subroutines for computing eigenvectors of symmetric positive definite 
matrices as follows. Factor W = W^^'^W^^'^ where W^^"^ is any square root of 
W and W^^^ is its Hermitian. Then to compute the uf. 

1. Prom the whitened data correlation matrix: 



(33) 
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where W is the inverse of W^^"^. Note that R is symmetric and positive 
definite. 

2. Solve for the eigenvectors ti and corresponding eigenvalues Aj of R. 

Rti = KU (34) 

where tjij = 5i,j. Sort these so that the eigenvalues are in descending 
order. 

3. Then: 

u, = (35) 

It is also interesting to consider the effect of using the structured covariance 
matrix estimate when forming either a classical or optimal beamformer. Let 
wq be the ideal array response for a signal in a particular direction. The classical 
beamformer estimates the signal s[n\ from the array data as s[n] = Wo^x[n]. 
The expected received power from this direction is then £'[s^[n]] = Ws^ReWjo- 
Now suppose that Wg is in the space spanned by the columns of R~^U, i.e. 
wq = R^^Ua for some vector a. It is shown in Appendix A that Rq^U = R~^U. 
Therefore: 

WoReWo = a"u"R-"ReR~^Ua 

= a^U^'R-'^Ua 

= wSRm (36) 

In this case, replacing R with the structured covariance estimate Rq in the clas- 
sical beamformer makes no difference. However, if Wjq is not in the subspace 
spanned by R~^Ui, ...,R~^Up, then Rg^Wj^ 7^ R'^MLo^ using the structured 
covariance estimate in the classical beamformer will yield a different beam pat- 
tern. 

A similar statement holds for the optimum minimum variance beamformer, 
s[n] = uFx[n], which uses a window w designed such that the expected response 
energy R0W_ is minimized subject to the constraint that the response to a 
plane wave from the direction of interest is unity, w^Wq = 1. The solution is 
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m = [w]) ^o^liljd) ^ Rg^UIjd- Note that if wq is in the subspaee spanned by the 
columns of U, then there exists some vector a such that wq = Ua. Since 

Rq^Wq = Rg^Ua = R'^Ua = i?~Vo (37) 
which in turn impUes: 

w = (tw^-R^ Vo) ^ Re^m = {wx)R'^m) ^ R~^m.o (38) 

In this case, replacing R with the structured covariance estimate Rq in the op- 
timal beamformer makes no difference. However, if wq is not in the subspaee 
spanned by the columns of U, then R^^Wq ^ R~^'Wj^i and using the structured 
covariance estimate in the optimal beamformer will yield a different beam pat- 
tern. These results are contrary to the suggestion implied in [5] that replacing R 
with Rq in an optimal beamformer should make no difference. 

4. Conclusion 

In this paper, we have derived the optimal solution for correlation matrix 
estimation by the CE and RCE principles. The two methods give identical re- 
sults in the problem of estimating the sum of a low rank signal matrix plus 
noise matrix, differing only in the value of the noise level estimate. The RCE 
method gives the same results as the Maximum Likelihood approach, and when 
the noise is white, both methods are similar to MUSIC. It is interesting that the 
cross-entropy approach thus provides a unifying framework for deriving spectral 
estimation algorithm including Bartlett, MLM[8], MEM [10], and now MUSIC. 
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A Derivation of CE Beamforming Algorithm 



In this appendix wc derive the optimal structured covariance estimate using 
the CE principle. First, to simplify the effort, let us define: V = f/A^/^, where 
AV2 = diagiA^^, A]v^^). Then: 



Rn = VV^ + a'W 



(39) 



Substitute this into the CE entropy expression (6), and set the derivatives with 
respect to the real and imaginary part of every element of the V matrix, and 
with respect to cr^, to zero. Arranging these derivatives in complex matrix form 
gives: 







tr{{R-^ - R^^)W} = 



(40) 
(41) 



Using the Woodward lemma: 

R-^ = - i,w-W 



X^w-^v + 1 

(7^ 



Substituting into (40) and simplifying gives: 



v"^w-'^v + i 



-1 



(42) 



(43) 



This equation has many possible solutions. Let V refer to any one of these. 
Then let * = V^W'^V. Diagonalize by factoring it: * = Q^Q^ , where $ is 
diagonal and Q is orthonormal, Q^Q = I. Define V = VQ. Note that V is also 
a solution to (43). In fact, 



R-^V = X^W-'^V 



n -1 



and: 



(44) 



(45) 
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Let the P columns of V be vi^...,Vp, and let the P diagonal elements of $ be 
(pi, (pp. Then: 

\R'^Vi = W-^Vi (46) 

where: 

\i = ^i + (47) 

The columns of V must therefore either be zero, or else must be generalized 
eigenvector solutions to (46). Because R and W are conjugate symmetric and 
positive definite, there are always N linearly independent generalized eigenvector 
solutions vi, ...,^]v to (46), with corresponding generalized eigenvalues Ai, Ajv 
which are positive. Assume without loss of generality that the first Pq columns 
of V are non-zero, where Pq ^ P ■ These first Pq columns must be selected from 
among the N possible generalized eigenvectors, in a manner we will determine 
later. Also note that it is not necessary to estimate Q or V directly, since we can 
construct Rq directly from V: 

Rf) = vV^ + a^W 

= VQQ^V^ + a^W 

= VV" + a^W (48) 

Now to solve for a^. Substitute (42) into (41), and simplify by exploiting the 
facts that tr{AB) = tr{BA) and tr{C+D) = tr{C)+tr{D) and tr{aC) = atr{C) 
where A, B are matrices, C, D are square matrices, and a is a scalar. 
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= tr{{RQ^ - R-^)W} 



= tr< 




= tr< 




N 




AT - Po 1 
1=1 * 



(7^ 



-1 



-tr{ii:-^w} 



-1 




1 








(7^ 





-tr{R-^W} 



(49) 



where we used (45) in the fourth hne, and (47) in the fifth. This can be further 
simpUfied by noticing that if Vi is any generahzed eigenvector solution to (46), 
then: 

1 _ 1 

(50) 



WR-% = w{^w-H^ = 1^, 

A,; A,; 



Therefore, the Vi are eigenvectors of WR ^ with eigenvalues l/Aj. Thus: 



N 



tr{WR-'} = J2Y. 



1=1 

^2 



(51) 



Substituting back into (49), then solving for a gives: 

o N-Po 



a 



N 
=Po+l 



(52) 



Now substitute the solution for V and for C72 into (48), and then substitute 
this back into the formula (6) for the cross-entropy. The algebra is simplified by 
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noting that if Vi is any generalized eigenvector solution to (46) , then: 



A,; 



+ (T^)vi^ for i = 1, ....,Po 



j-cr'^Vi for i =Pq + 1,...,N 



£j for i = 1, Pq 

for i = Po + l,...,N 



(53) 



Therefore, the Vi are all eigenvectors of RgR~^. The first Pq eigenvalues are 
equal to 1, and the remainder are equal to cr^/App+i, ...jCt^/Aat- Putting all this 
together, the cross-entropy at this solution has the value: 

H{qe,p) = C{tr{R0R-^}-N -log\ReR-^} 

= ^ 



N N 2) 

i=Po+l i=Po+l J 



N 



«=Po+l 



Substituting the value of from (52) gives the alternate form: 



(54) 



Hiqe,p)=m-Po)log 



N 

N-Po , A, 

t=Po+l 



TV 

n 

>i=Po+l 



l/(Ar-Po) 



(55) 



Now to return to the issue of which of the N possible generalized eigenvector 
solutions should be used for the Pq non-zero columns of V. Let us call the 
selected Pq eigenvectors Vi,-.-,Vp^^ the "signal eigenvectors", and let us call the 
remainder the "noise eigenvectors". The signal eigenvectors satisfy 7^ 0; since 



W-^ > 0, then 



vi^W ^v, > and thus A,- 



+ a'^ > cr^ for i 
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1,...,Pq. We show that these signal cigcnvahics must be the largest eigenvalue 
solutions to (46). Suppose this were not true, so that the global optimum solution 
corresponded to an Rg such that one of the signal eigenvalues, say Ap^, was 
smaller than the largest of the noise eigenvalues, say Apy+i. Thus a'^ < \pg < 
Apg+i. But then, as we will see, swapping these eigensolutions, making Vp^-^-i 
a signal eigenvector and Vp^ a noise eigenvector will further decrease the cross- 
entropy, contradicting our assumption of global optimality. To show this, let 
i?(Ap(,+i,Apg+2,-",Aiv) represent the cross-entropy with a model Re built using 
non-zero solutions Vi, ■■■,Vp^_i,Vp^, and let (App, Ap(j+2, Ajv) represent the 
cross-entropy with a model Re built using non-zero solutions Vi, ^P(,_i, Wp„_|_i. 
Then because the cross-entropy formula (55) is an analytic function of the Aj, by 
the mean value theorem: 



-H'(Apo+i,Apo+2,..., Aat) - i7(Ap(,^Ap(,+2,---, Aat) 
dH 

— (A, Apo+2,...., AAr)|A=A(M+i - M) 



(56) 



where A is some value in the range Apg < A < Apg+i. But: 



dH _ ^1 
aA " ^3? 



> 



iV-Pn 



TV 



A ^ ^ Ai / 

i=Po+2 V 



(57) 



for all Apg < A < Apj+i, where the last line is true because: 



A > Apo 

,2 



> 



N-Pq 

Z^i=Po+l A, 
N-Pq 

A Z^i=Po+2 Ai 



(58) 



Since Ap^+i — Apg > 0, the change in (56) must be positive. Therefore, swapping 
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Vp^^ and ;Wp(,_|-i reduces the cross-entropy, and our assumed global optimum solu- 
tion cannot be globally optimum. The Pq signal eigenvalues must therefore be 
the largest eigenvalue solutions to (46), and the non-zero Pq columns of V must 
be the corresponding general eigenvectors. 

Finally, we must show that we should always choose Pq = P eigenvectors. 
Without loss of generality, let us sort all the eigenvalues Ai > A2 > ... > Ajv- Let 
Hi represent the minimum cross-entropy with i non-zero columns in V. Then 
using (55): 



Hp,-Hp,+i = C(iV-Po)log 



> 



(iV-Po) Xpo+i 



+ 



(1 (jv-Po)) i 



1 



(N-P„) / {N-Pq) 

^Pq+i J V A , 



(59) 



where j = j^_p^_i YliLpo+2 'h where we used the inequality pa + {l — p)/3 > 
a^P^^"^^ for any < p < 1 in the last line. Thus the cross-entropy decreases as 
Pq varies from to P, so the best choice for Pq must be Pq = P . 

The proof that Rq is unique when Ap > Ap+i is messy but straightfor- 
ward. The key issue is that the space spanned by the signal eigenvectors is 
uniquely determined. If there are multiple signal eigenvalues, then the eigen- 
vectors themselves may not be uniquely determined, and thus V may not be 
uniquely determined. 

We get the formulas in the text by defining U = V^~^^^ . 
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B Derivation of RCE Algorithm 

In this appendix we give the solution to the RCE problem. The derivation is 
quite similar to that for the CE problem, and therefore we present this quickly. 
With our Gaussian models, the RCE cross-entropy has the value: 

RCE: H{p,q0)=^{tr{Rg^R)-N-\og\Rg^R\} (60) 

Differentiating with respect to the real and imaginary parts of V and setting 
these to zero, as before, gives: 

{R-^RRq^ - Rg^) F = (61) 

Multiplying both sides by R~^Re gives: 

{Rq^ - R-^)V = (62) 

which is exactly the same equations which the solution for V in the CE problem 
must satisfy, (40). Therefore, we can construct Rq from (48), where the columns 
of V must be solutions to the generalized eigenvector problem (46). 

Now differentiating (60) with respect to and setting it to zero gives: 

tr{{Rg^RRg^ - Rg^)W} = 

Combining this with (61) gives: 

= tr{{R-^RR-^ - R-^){VV" + a'^W)} 
= tr{{R-^RR-^ -R^^)Re} 
= tr{Rg^R-I} (64) 

which implies that: 

tr{Rg^R} = N (65) 

But (53) implies that Rq ^R has Pq eigenvalues equal to 1, and the rest have 
values Apo_|_i/c7^, Aat/ct^. Since the trace of a matrix is just the sum of its 



(63) 
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eigenvalues: 

1 ^ 

^o + — Yl = ^ (66) 



i=Po+l 



which gives: 

N 



^^^ = -4^ E (67) 

i=Po+l 

Using the facts that the trace of a matrix is the sum of the eigenvalues, and the 
determinant is the product of the eigenvalues: 



a' 



N 2 

= ? E I^ST- (68) 

i=Po+l 



The proofs that we must choose Ai, Ap^ to be the largest eigenvalues, that we 
should choose Pq = P ^ and that the solution Rq is unique if Ap > Ap+i, are 
similar to the proofs for the CE algorithm. 
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